# Lightweight multimodal

SmolVLM Instruct GGUF

SmolVLM is a compact open-source multimodal model that accepts image and text inputs and generates text outputs. It is designed for efficiency and suits on-device applications.

Transformers English

SmolVLM2 2.2B Instruct

SmolVLM2-2.2B is a lightweight multimodal model designed for analyzing video content. It can process video, image, and text inputs and generate text outputs.

Transformers English

UForm Gen2 Qwen 500M

UForm-Gen is a small generative vision-language model used primarily for image captioning and visual question answering.

Transformers English

© 2025 AIbase